536 research outputs found
Equi-energy sampler with applications in statistical inference and statistical mechanics
We introduce a new sampling algorithm, the equi-energy sampler, for efficient
statistical sampling and estimation. Complementary to the widely used
temperature-domain methods, the equi-energy sampler, utilizing the
temperature--energy duality, targets the energy directly. The focus on the
energy function not only facilitates efficient sampling, but also provides a
powerful means for statistical estimation, for example, the calculation of the
density of states and microcanonical averages in statistical mechanics. The
equi-energy sampler is applied to a variety of problems, including exponential
regression in statistics, motif sampling in computational biology and protein
folding in biophysics.
Comment: This paper is discussed in [math.ST/0611217], [math.ST/0611219],
[math.ST/0611221], [math.ST/0611222]. Rejoinder in [math.ST/0611224].
Published at http://dx.doi.org/10.1214/009053606000000515 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
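The temperature-energy duality and the estimation targets named in this abstract can be written out in standard statistical-mechanics notation. The symbols below ($h$ for the energy function, $T$ for the temperature, $\Omega$ for the density of states, $g$ for a test function) are notational assumptions and are not taken from the abstract itself.

```latex
\begin{align*}
\pi_T(x) &= \frac{1}{Z(T)} \exp\{-h(x)/T\}
  && \text{(Boltzmann distribution at temperature } T\text{)} \\
Z(T) &= \int \exp\{-h(x)/T\}\,dx \;=\; \int \Omega(u)\, e^{-u/T}\,du
  && \text{(partition function via the density of states)} \\
\mu_g(u) &= \mathbb{E}\bigl[\, g(X) \mid h(X) = u \,\bigr]
  && \text{(microcanonical average at energy } u\text{)}
\end{align*}
```

The second line is the sense in which temperature and energy are dual: $Z(T)$ is a Laplace-type transform of $\Omega(u)$, so knowledge of the density of states determines the partition function at every temperature. The conditional expectation in the third line can be taken under any $\pi_T$, because the conditional distribution of $X$ given its energy does not depend on $T$.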
Model-based Analysis of Oligonucleotide Arrays: Model Validation, Design Issues and Standard Error Application
Background: A model-based analysis of oligonucleotide expression arrays that we developed previously uses a probe-sensitivity index to capture the response characteristic of a specific probe pair and calculates model-based expression indexes (MBEI). Each MBEI has a standard error attached to it as a measure of accuracy. Here we investigate the stability of the probe-sensitivity index across different tissue types, the reproducibility of results in replicate experiments, and the use of MBEI in perfect match (PM)-only arrays.
Results: Probe-sensitivity indexes are stable across tissue types. The target gene's presence in many arrays of an array set allows the probe-sensitivity index to be estimated accurately. We extended the model to obtain expression values for PM-only arrays, and found that the 20-probe PM-only model is comparable to the 10-probe PM/MM difference model in terms of expression correlations with the original 20-probe PM/MM difference model. The MBEI method extends the reliable detection limit of expression to a lower mRNA concentration. The standard errors of MBEI can be used to construct confidence intervals for fold changes, and the lower confidence bound of the fold change is a better ranking statistic for filtering genes. By resampling clustering trees, we can assign reliability indexes to genes in a specific cluster of interest in hierarchical clustering. The software dChip, which implements many of these analysis methods, is made available.
Conclusions: The model-based approach reduces the variability of low expression estimates and provides a natural method of calculating expression values for PM-only arrays. The standard errors attached to expression values can be used to assess the reliability of downstream analysis.
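As a rough illustration of how an expression index, a probe-sensitivity index, and a standard error can come out of one fit, here is a minimal sketch. The abstract does not state the model equation; the sketch assumes a multiplicative form y[i][j] = theta[i] * phi[j] + noise, with the probe sensitivities normalized so that their squares sum to the number of probes. The function name `fit_rank_one`, the alternating least-squares scheme, and the pooled-variance standard error are all illustrative assumptions, not the dChip implementation.

```python
import math

def fit_rank_one(y, n_iter=50):
    """Fit y[i][j] ~ theta[i] * phi[j] by alternating least squares.

    y[i][j] is the signal for array i and probe j (assumed layout).
    Returns (theta, phi, se), where phi is scaled so sum(phi^2) == J and
    se is an approximate standard error shared by every theta[i].
    """
    n_arrays, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes  # starting value; already satisfies sum(phi^2) == J

    for _ in range(n_iter):
        # Least-squares update of each array's expression index given phi.
        denom = sum(p * p for p in phi)
        theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / denom
                 for i in range(n_arrays)]
        # Least-squares update of each probe's sensitivity given theta.
        denom = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n_arrays)) / denom
               for j in range(n_probes)]
        # Re-impose the identifiability constraint sum(phi^2) == J.
        scale = math.sqrt(n_probes / sum(p * p for p in phi))
        phi = [p * scale for p in phi]

    # Final theta update so that theta matches the normalized phi.
    theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / n_probes
             for i in range(n_arrays)]

    # Pooled residual variance (a simplifying assumption of this sketch);
    # a rank-one fit uses n_arrays + n_probes - 1 free parameters.
    ssr = sum((y[i][j] - theta[i] * phi[j]) ** 2
              for i in range(n_arrays) for j in range(n_probes))
    dof = n_arrays * n_probes - n_arrays - n_probes + 1
    sigma2 = ssr / dof
    # Treating phi as known, Var(theta_i) = sigma^2 / sum(phi^2) = sigma^2 / J.
    se = math.sqrt(sigma2 / n_probes)
    return theta, phi, se
```

Under these assumptions, `theta[i]` plays the role of the expression index for array i and `phi[j]` the role of the probe-sensitivity index for probe j. The fit needs at least two arrays and two probes so that the residual degrees of freedom are positive.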
On the characterization of non-negative volume-matching surface splines
In this paper we study the surface spline that minimizes the Dirichlet integral over a two-dimensional bounded domain among all non-negative functions satisfying a finite number of volume-matching constraints. Existence and uniqueness of this surface spline are proved. A characterization by a variational inequality is given, revealing the local and boundary behaviour of the surface spline. This characterization is important for constructing numerical algorithms that produce non-negative smooth surfaces from aggregated data.
Reconstructing the energy landscape of a distribution from Monte Carlo samples
Defining the energy function as the negative logarithm of the density, we
explore the energy landscape of a distribution via the tree of sublevel sets of
its energy. This tree represents the hierarchy among the connected components
of the sublevel sets. We propose ways to annotate the tree so that it provides
information on both topological and statistical aspects of the distribution,
such as the local energy minima (local modes), their local domains and volumes,
and the barriers between them. We develop a computational method to estimate
the tree and reconstruct the energy landscape from Monte Carlo samples
simulated over a wide energy range of a distribution. This method can be applied
to an arbitrary distribution on a space with defined connectedness. We test
the method on multimodal distributions and posterior distributions to show that
our estimated trees are accurate compared to theoretical values. When used to
perform Bayesian inference of DNA sequence segmentation, this approach reveals
much more information than the standard approach based on marginal posterior
distributions.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS196 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
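The idea of a tree of sublevel sets can be illustrated on a toy case. The sketch below is not the estimation method of the paper, which works from Monte Carlo samples. It assumes instead that the energy (negative log density) is already known on a one-dimensional grid where each point is connected to its left and right neighbours. The function name `sublevel_tree_1d` and the output layout are hypothetical. Points are added in order of increasing energy, and a union-find structure tracks the connected components of the growing sublevel set.

```python
def sublevel_tree_1d(energy):
    """Build the merge structure of the sublevel sets of a 1-D energy grid.

    Returns (minima, merges):
      minima: grid indices of local minima, in order of increasing energy
      merges: tuples (barrier_energy, dying_minimum, surviving_minimum),
              recorded when two components of the sublevel set join
    """
    n = len(energy)
    parent = [None] * n   # None means "not yet inside the sublevel set"
    lowest = {}           # component root -> index of its lowest-energy point

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    minima, merges = [], []
    # Raising the energy level adds grid points one at a time.
    for i in sorted(range(n), key=lambda k: energy[k]):
        roots = {find(j) for j in (i - 1, i + 1)
                 if 0 <= j < n and parent[j] is not None}
        if not roots:
            # No neighbour is in the set yet: a new component is born here.
            parent[i] = i
            lowest[i] = i
            minima.append(i)
            continue
        # The component with the deepest minimum survives any merge.
        ordered = sorted(roots, key=lambda r: energy[lowest[r]])
        survivor = ordered[0]
        parent[i] = survivor
        for r in ordered[1:]:
            # Point i is the barrier joining component r to the survivor.
            merges.append((energy[i], lowest[r], lowest[survivor]))
            parent[r] = survivor
    return minima, merges


# Three wells at grid indices 3, 1 and 5, separated by barriers at 2.5 and 4.0.
minima, merges = sublevel_tree_1d([3.0, 1.0, 2.5, 0.5, 4.0, 2.0, 5.0])
print(minima)  # [3, 1, 5]
print(merges)  # [(2.5, 1, 3), (4.0, 5, 3)]
```

Each entry of `merges` corresponds to an internal node of the tree: the barrier energy at which a shallower local mode is absorbed into a deeper one. Ties in energy are not handled specially in this sketch.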
- …